{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Lesson_2_SOTA.ipynb",
      "version": "0.3.2",
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZszpXDZ1oTqm",
        "colab_type": "text"
      },
      "source": [
        "Notebook by Zach Mueller"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0G7bnbPmH5A6",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from fastai.vision import *"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YknwvbQjC62F",
        "colab_type": "text"
      },
      "source": [
        "# Dataset:\n",
        "\n",
        "Our dataset today will be ImageWoof. [Link](https://github.com/fastai/imagenette)\n",
        "\n",
        "Goal: Using no pre-trained weights, see how well of accuracy we can get in x epochs\n",
        "\n",
        "This dataset is generally harder than imagenette, both are a subset of ImageNet.\n",
        "\n",
        "Models are leaning more towards being faster, more effecient\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zHZANDslHpGq",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def get_data(size, woof, bs, workers=None):\n",
        "    if   size<=128: path = URLs.IMAGEWOOF if woof else URLs.IMAGENETTE\n",
        "    elif size<=224: path = URLs.IMAGEWOOF_320 if woof else URLs.IMAGENETTE_320\n",
        "    else          : path = URLs.IMAGEWOOF     if woof else URLs.IMAGENETTE\n",
        "    path = untar_data(path)\n",
        "\n",
        "    n_gpus = num_distrib() or 1\n",
        "    if workers is None: workers = min(8, num_cpus()//n_gpus)\n",
        "\n",
        "    return (ImageList.from_folder(path).split_by_folder(valid='val')\n",
        "            .label_from_folder().transform(([flip_lr(p=0.5)], []), size=size)\n",
        "            .databunch(bs=bs, num_workers=workers)\n",
        "            .presize(size, scale=(0.35,1))\n",
        "            .normalize(imagenet_stats))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yImxYNeJH-AN",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "data = get_data(128, True, 64)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0Xpkh8REIct4",
        "colab_type": "text"
      },
      "source": [
        "We will be following a progression that started on the fastai forums [here](https://forums.fast.ai/t/meet-mish-new-activation-function-possible-successor-to-relu/53299/) on August 26th of this year.\n",
        "\n",
        "In this \"competition\" included:\n",
        "  * [Less](https://forums.fast.ai/u/lessw2020)\n",
        "  * [Seb](https://forums.fast.ai/u/seb)\n",
        "  * [Mikhail Grankin](https://forums.fast.ai/u/grankin)\n",
        "  * [Federico Lois](https://forums.fast.ai/u/redknight)\n",
        "  * [Ignacio Oguiza](https://forums.fast.ai/u/oguiza)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ehv8AcGK65bK",
        "colab_type": "text"
      },
      "source": [
        "# The Competition:\n",
        "\n",
        "* Lasted roughly 3 days\n",
        "* We explored a variety of papers and combining various ideas to see what *together* could work the best"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KrMPAy167OAr",
        "colab_type": "text"
      },
      "source": [
        "## Papers Referenced:\n",
        "\n",
        "* [Bag of Tricks for Resnet (aka the birth of xResNet)](https://arxiv.org/abs/1812.01187)\n",
        "* [Large Batch Optimization for Deep Learning, LAMB](https://arxiv.org/abs/1904.00962)\n",
        "* [Large Batch Training of Convolutional Networks, LARS](https://arxiv.org/pdf/1708.03888.pdf)\n",
        "* [Lookahead Optimizer: k steps forward, 1 step back](https://arxiv.org/abs/1907.08610)\n",
        "* [Mish: A Self Regularized Non-Monotonic Neural Activation Function](https://arxiv.org/abs/1908.08681v1)\n",
        "* [On the Variance of the Adaptive Learning Rate and Beyond, RAdam](https://arxiv.org/abs/1908.03265)\n",
        "* [Self-Attention Generative Adversarial Networks](https://arxiv.org/abs/1805.08318)\n",
        "* [Stochastic Gradient Methods with Layer-wise\n",
        "Adaptive Moments for Training of Deep Networks, Novograd](https://arxiv.org/pdf/1905.11286.pdf)\n",
        "\n",
        "\n",
        "## Other Equally as Important Noteables:\n",
        "* Flatten + Anneal Scheduling - Mikhail Grankin\n",
        "* Simple Self Attention - Seb"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "55fuRIRsIqxs",
        "colab_type": "text"
      },
      "source": [
        "One trend you will notice throughout this exercise is we (everyone mentioned above and myself) all tried combining a variety of these tools and papers together before Seb eventually came up with the winning solution. For a bit of context, here is the pre-competition State of the Art for ImageWoof:\n",
        "![](https://forums.fast.ai/uploads/default/original/3X/9/3/9386db85de3d7ad9c7d567484fb929bb40a93d85.jpeg)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ciDEdyPUJwf3",
        "colab_type": "text"
      },
      "source": [
        "And here was the winning results:\n",
        "\n",
        "![](https://forums.fast.ai/uploads/default/optimized/3X/a/6/a68876e6f99a87c8c81db6c39125f8f1eae99f1f_2_690x271.jpeg)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2s9t11BkKGQ_",
        "colab_type": "text"
      },
      "source": [
        "As a general rule of thumb, we always want to make sure our results are reproducable, hence the multiple runs and reports of the Standard Deviation, Mean, and the Maximum found. For today, we will just do one run of five for time. Following no particular order, here is a list of what was tested, and what we will be testing today:\n",
        "\n",
        "* Baseline (Adam + xResnet50) + OneCycle\n",
        "* Ranger (RAdam + LookAhead) + OneCycle\n",
        "* Ranger + Flatten Anneal\n",
        "* Ranger + MXResnet (xResnet50 + Mish) + Flatten Anneal\n",
        "* RangerLars (Ralamb + LARS + Ranger) + Flatten Anneal\n",
        "* RangerLars + xResnet50 + Flatten Anneal\n",
        "* Ranger + SimpleSelfAttention + MXResnet + Flatten Anneal\n",
        "\n",
        "The last of which did achieve the best score overall."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "br_RmMD6QpAT",
        "colab_type": "text"
      },
      "source": [
        "## Functions:\n",
        "\n",
        "For the sake of simplicity, we will borrow from Seb's gitub repository."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dLwNYYepRAAH",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 119
        },
        "outputId": "18da618b-0afa-4264-d7c4-3a353904ef76"
      },
      "source": [
        "!git clone https://github.com/sdoria/mish"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Cloning into 'mish'...\n",
            "remote: Enumerating objects: 46, done.\u001b[K\n",
            "remote: Counting objects: 100% (46/46), done.\u001b[K\n",
            "remote: Compressing objects: 100% (40/40), done.\u001b[K\n",
            "remote: Total 46 (delta 21), reused 19 (delta 6), pack-reused 0\u001b[K\n",
            "Unpacking objects: 100% (46/46), done.\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "e3ou9mwDRbgk",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 51
        },
        "outputId": "37849893-97af-4be3-f96c-033dc5b21dc9"
      },
      "source": [
        "%cd mish\n",
        "from rangerlars import *\n",
        "from mish import *\n",
        "from mxresnet import *\n",
        "from ranger import *"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "/content/mish\n",
            "Mish activation loaded...\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0l1-KBsiRlbb",
        "colab_type": "text"
      },
      "source": [
        "# Running the tests\n",
        "\n",
        "For our tests, we will use the overall accuracy as well as the top_k, as this is what was used in Jeremy's example. Do note that top_k is not quite as relevent here as we only have 10 classes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YoUy5mNWRrDU",
        "colab_type": "text"
      },
      "source": [
        "## Baseline"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vpDu6BIDR_2s",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "opt_func = partial(optim.Adam, betas=(0.9,0.99), eps=1e-6)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "G4qPOSmFRsGs",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "learn = Learner(data, models.xresnet50(c_out=10), wd=1e-2, opt_func=opt_func,\n",
        "               bn_wd=False, true_wd=True, loss_func=LabelSmoothingCrossEntropy(),\n",
        "               metrics=[accuracy, top_k_accuracy])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3oaAeTgbSVOu",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "outputId": "1cf83d01-4929-4596-bca7-673395c742ed"
      },
      "source": [
        "learn.fit_one_cycle(5, 3e-3, div_factor=10, pct_start=0.3)"
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>epoch</th>\n",
              "      <th>train_loss</th>\n",
              "      <th>valid_loss</th>\n",
              "      <th>accuracy</th>\n",
              "      <th>top_k_accuracy</th>\n",
              "      <th>time</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>2.153507</td>\n",
              "      <td>2.155606</td>\n",
              "      <td>0.236000</td>\n",
              "      <td>0.764000</td>\n",
              "      <td>01:12</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1</td>\n",
              "      <td>1.950127</td>\n",
              "      <td>2.465281</td>\n",
              "      <td>0.282000</td>\n",
              "      <td>0.788000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>2</td>\n",
              "      <td>1.722233</td>\n",
              "      <td>1.586035</td>\n",
              "      <td>0.488000</td>\n",
              "      <td>0.932000</td>\n",
              "      <td>01:12</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>3</td>\n",
              "      <td>1.523372</td>\n",
              "      <td>1.403256</td>\n",
              "      <td>0.588000</td>\n",
              "      <td>0.952000</td>\n",
              "      <td>01:12</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>4</td>\n",
              "      <td>1.379958</td>\n",
              "      <td>1.315990</td>\n",
              "      <td>0.624000</td>\n",
              "      <td>0.970000</td>\n",
              "      <td>01:12</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "maPDM39xWSJX",
        "colab_type": "text"
      },
      "source": [
        "## Ranger + OneCycle"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "S_JKRL4RWWVI",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "opt_func = partial(Ranger, betas=(0.9,0.99), eps=1e-6)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nMJS945OWciJ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "learn = Learner(data, models.xresnet50(c_out=10), wd=1e-2, opt_func=opt_func,\n",
        "               bn_wd=False, true_wd=True, loss_func=LabelSmoothingCrossEntropy(),\n",
        "               metrics=[accuracy, top_k_accuracy])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "v5zn1DygWfIo",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "outputId": "3cfe337e-eda6-42b6-a98f-66d9d29615f2"
      },
      "source": [
        "learn.fit_one_cycle(5, 3e-3, div_factor=10, pct_start=0.3)"
      ],
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>epoch</th>\n",
              "      <th>train_loss</th>\n",
              "      <th>valid_loss</th>\n",
              "      <th>accuracy</th>\n",
              "      <th>top_k_accuracy</th>\n",
              "      <th>time</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>1.897928</td>\n",
              "      <td>2.008069</td>\n",
              "      <td>0.312000</td>\n",
              "      <td>0.796000</td>\n",
              "      <td>01:12</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1</td>\n",
              "      <td>1.791323</td>\n",
              "      <td>1.728758</td>\n",
              "      <td>0.418000</td>\n",
              "      <td>0.898000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>2</td>\n",
              "      <td>1.669062</td>\n",
              "      <td>1.666615</td>\n",
              "      <td>0.478000</td>\n",
              "      <td>0.912000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>3</td>\n",
              "      <td>1.570262</td>\n",
              "      <td>1.517981</td>\n",
              "      <td>0.548000</td>\n",
              "      <td>0.936000</td>\n",
              "      <td>01:12</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>4</td>\n",
              "      <td>1.525395</td>\n",
              "      <td>1.487397</td>\n",
              "      <td>0.548000</td>\n",
              "      <td>0.938000</td>\n",
              "      <td>01:12</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LG3ETMSVYh0c",
        "colab_type": "text"
      },
      "source": [
        "## Ranger + Flatten Anneal"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "d-hQoUBkZBjS",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from fastai.callbacks import *"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Da-Nsip8YkyG",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def flattenAnneal(learn:Learner, lr:float, n_epochs:int, start_pct:float):\n",
        "  n = len(learn.data.train_dl)\n",
        "  anneal_start = int(n*n_epochs*start_pct)\n",
        "  anneal_end = int(n*n_epochs) - anneal_start\n",
        "  phases = [TrainingPhase(anneal_start).schedule_hp('lr', lr),\n",
        "           TrainingPhase(anneal_end).schedule_hp('lr', lr, anneal=annealing_cos)]\n",
        "  sched = GeneralScheduler(learn, phases)\n",
        "  learn.callbacks.append(sched)\n",
        "  learn.fit(n_epochs)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "rtllGYw1Znca",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "opt_func = partial(Ranger, betas=(0.9,0.99), eps=1e-6)\n",
        "learn = Learner(data, models.xresnet50(c_out=10), wd=1e-2, opt_func=opt_func,\n",
        "               bn_wd=False, true_wd=True, loss_func=LabelSmoothingCrossEntropy(),\n",
        "               metrics=[accuracy, top_k_accuracy])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "GYr2ZXxcZjz7",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "outputId": "6679f3f6-de3e-4f07-dfd4-1b223fb9a550"
      },
      "source": [
        "flattenAnneal(learn, 3e-3, 5, 0.7)"
      ],
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>epoch</th>\n",
              "      <th>train_loss</th>\n",
              "      <th>valid_loss</th>\n",
              "      <th>accuracy</th>\n",
              "      <th>top_k_accuracy</th>\n",
              "      <th>time</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>2.098281</td>\n",
              "      <td>2.399817</td>\n",
              "      <td>0.266000</td>\n",
              "      <td>0.744000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1</td>\n",
              "      <td>1.929641</td>\n",
              "      <td>2.321711</td>\n",
              "      <td>0.302000</td>\n",
              "      <td>0.798000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>2</td>\n",
              "      <td>1.733089</td>\n",
              "      <td>1.623181</td>\n",
              "      <td>0.506000</td>\n",
              "      <td>0.920000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>3</td>\n",
              "      <td>1.582382</td>\n",
              "      <td>1.617398</td>\n",
              "      <td>0.496000</td>\n",
              "      <td>0.924000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>4</td>\n",
              "      <td>1.361795</td>\n",
              "      <td>1.311129</td>\n",
              "      <td>0.670000</td>\n",
              "      <td>0.952000</td>\n",
              "      <td>01:13</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LkNMMZTtctc4",
        "colab_type": "text"
      },
      "source": [
        "## Ranger + MXResnet + Flatten Anneal"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ghwm7gIYcy_E",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "opt_func = partial(Ranger, betas=(0.9,0.99), eps=1e-6)\n",
        "learn = Learner(data, mxresnet50(c_out=10), wd=1e-2, opt_func=opt_func,\n",
        "               bn_wd=False, true_wd=True, loss_func=LabelSmoothingCrossEntropy(),\n",
        "               metrics=[accuracy, top_k_accuracy])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IBw2Sz1Ec8zU",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "outputId": "59793100-e8b9-4e72-b441-6c9e94187309"
      },
      "source": [
        "flattenAnneal(learn, 4e-3, 5, 0.7)"
      ],
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>epoch</th>\n",
              "      <th>train_loss</th>\n",
              "      <th>valid_loss</th>\n",
              "      <th>accuracy</th>\n",
              "      <th>top_k_accuracy</th>\n",
              "      <th>time</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>2.054917</td>\n",
              "      <td>2.199414</td>\n",
              "      <td>0.286000</td>\n",
              "      <td>0.804000</td>\n",
              "      <td>01:17</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1</td>\n",
              "      <td>1.804237</td>\n",
              "      <td>3.025912</td>\n",
              "      <td>0.254000</td>\n",
              "      <td>0.732000</td>\n",
              "      <td>01:17</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>2</td>\n",
              "      <td>1.616225</td>\n",
              "      <td>1.517143</td>\n",
              "      <td>0.574000</td>\n",
              "      <td>0.944000</td>\n",
              "      <td>01:17</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>3</td>\n",
              "      <td>1.449524</td>\n",
              "      <td>1.379319</td>\n",
              "      <td>0.622000</td>\n",
              "      <td>0.938000</td>\n",
              "      <td>01:18</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>4</td>\n",
              "      <td>1.221281</td>\n",
              "      <td>1.168319</td>\n",
              "      <td>0.728000</td>\n",
              "      <td>0.958000</td>\n",
              "      <td>01:18</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xePStNUtfIvx",
        "colab_type": "text"
      },
      "source": [
        "## RangerLars + MXResnet + Flatten Anneal"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "53D4GKcefNn-",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "opt_func = partial(RangerLars, betas=(0.9,0.99), eps=1e-6)\n",
        "learn = Learner(data, mxresnet50(c_out=10), wd=1e-2, opt_func=opt_func,\n",
        "               bn_wd=False, true_wd=True, loss_func=LabelSmoothingCrossEntropy(),\n",
        "               metrics=[accuracy, top_k_accuracy])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LbHTrlnhfR0e",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "outputId": "f751b735-b335-4cc3-d964-72897e80e0dd"
      },
      "source": [
        "flattenAnneal(learn, 4e-3, 5, 0.72)"
      ],
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>epoch</th>\n",
              "      <th>train_loss</th>\n",
              "      <th>valid_loss</th>\n",
              "      <th>accuracy</th>\n",
              "      <th>top_k_accuracy</th>\n",
              "      <th>time</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>1.945484</td>\n",
              "      <td>2.401585</td>\n",
              "      <td>0.318000</td>\n",
              "      <td>0.780000</td>\n",
              "      <td>01:31</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1</td>\n",
              "      <td>1.714290</td>\n",
              "      <td>1.956744</td>\n",
              "      <td>0.384000</td>\n",
              "      <td>0.844000</td>\n",
              "      <td>01:31</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>2</td>\n",
              "      <td>1.587898</td>\n",
              "      <td>1.804619</td>\n",
              "      <td>0.416000</td>\n",
              "      <td>0.906000</td>\n",
              "      <td>01:31</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>3</td>\n",
              "      <td>1.502960</td>\n",
              "      <td>1.540900</td>\n",
              "      <td>0.536000</td>\n",
              "      <td>0.928000</td>\n",
              "      <td>01:31</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>4</td>\n",
              "      <td>1.361548</td>\n",
              "      <td>1.342270</td>\n",
              "      <td>0.646000</td>\n",
              "      <td>0.954000</td>\n",
              "      <td>01:31</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Q9vO4lTZhL-S",
        "colab_type": "text"
      },
      "source": [
        "## RangerLars + xResnet50 + Flatten Anneal"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bkXiMbRshSnG",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "opt_func = partial(RangerLars, betas=(0.9,0.99), eps=1e-6)\n",
        "learn = Learner(data, models.xresnet50(c_out=10), wd=1e-2, opt_func=opt_func,\n",
        "               bn_wd=False, true_wd=True, loss_func=LabelSmoothingCrossEntropy(),\n",
        "               metrics=[accuracy, top_k_accuracy])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "oS6GGZU0hZvl",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "outputId": "0a80bd3e-6c42-4794-880e-fe972d1a7031"
      },
      "source": [
        "flattenAnneal(learn, 4e-3, 5, 0.72)"
      ],
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>epoch</th>\n",
              "      <th>train_loss</th>\n",
              "      <th>valid_loss</th>\n",
              "      <th>accuracy</th>\n",
              "      <th>top_k_accuracy</th>\n",
              "      <th>time</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>2.006913</td>\n",
              "      <td>2.315927</td>\n",
              "      <td>0.284000</td>\n",
              "      <td>0.708000</td>\n",
              "      <td>01:22</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1</td>\n",
              "      <td>1.801610</td>\n",
              "      <td>1.927570</td>\n",
              "      <td>0.378000</td>\n",
              "      <td>0.852000</td>\n",
              "      <td>01:22</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>2</td>\n",
              "      <td>1.703221</td>\n",
              "      <td>1.858955</td>\n",
              "      <td>0.394000</td>\n",
              "      <td>0.880000</td>\n",
              "      <td>01:22</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>3</td>\n",
              "      <td>1.643228</td>\n",
              "      <td>1.700991</td>\n",
              "      <td>0.448000</td>\n",
              "      <td>0.872000</td>\n",
              "      <td>01:22</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>4</td>\n",
              "      <td>1.488607</td>\n",
              "      <td>1.462179</td>\n",
              "      <td>0.594000</td>\n",
              "      <td>0.940000</td>\n",
              "      <td>01:22</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ryhJNYQViFPA",
        "colab_type": "text"
      },
      "source": [
        "## Ranger + SimpleSelfAttention + MXResnet + Flatten Anneal"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_nRLZJmdiJVJ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "opt_func = partial(Ranger, betas=(0.95,0.99), eps=1e-6)\n",
        "learn = Learner(data, mxresnet50(c_out=10, sa=True), wd=1e-2, opt_func=opt_func,\n",
        "               bn_wd=False, true_wd=True, loss_func=LabelSmoothingCrossEntropy(),\n",
        "               metrics=[accuracy, top_k_accuracy])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RiWGHmgZldOa",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 204
        },
        "outputId": "bd3ef8e1-a4e2-4f68-c908-c60cb63bc712"
      },
      "source": [
        "flattenAnneal(learn, 4e-3, 5, 0.72)"
      ],
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>epoch</th>\n",
              "      <th>train_loss</th>\n",
              "      <th>valid_loss</th>\n",
              "      <th>accuracy</th>\n",
              "      <th>top_k_accuracy</th>\n",
              "      <th>time</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>1.968602</td>\n",
              "      <td>2.051655</td>\n",
              "      <td>0.330000</td>\n",
              "      <td>0.830000</td>\n",
              "      <td>01:20</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1</td>\n",
              "      <td>1.709247</td>\n",
              "      <td>2.174664</td>\n",
              "      <td>0.384000</td>\n",
              "      <td>0.892000</td>\n",
              "      <td>01:21</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>2</td>\n",
              "      <td>1.537062</td>\n",
              "      <td>1.451682</td>\n",
              "      <td>0.598000</td>\n",
              "      <td>0.946000</td>\n",
              "      <td>01:21</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>3</td>\n",
              "      <td>1.393161</td>\n",
              "      <td>1.419797</td>\n",
              "      <td>0.578000</td>\n",
              "      <td>0.954000</td>\n",
              "      <td>01:21</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>4</td>\n",
              "      <td>1.172150</td>\n",
              "      <td>1.122868</td>\n",
              "      <td>0.746000</td>\n",
              "      <td>0.978000</td>\n",
              "      <td>01:20</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wBM72p6cnHtn",
        "colab_type": "text"
      },
      "source": [
        "As we can see, 74.6 is what we got. The highest recorded is 78%. \n",
        "\n",
        "From here:\n",
        "\n",
        "I encourage you all to try out some of the combinations seen here today and apply a bit more to it. For instance, are we using the best hyperparameters? What about Cut-Out? MixUp? Plenty more to explore!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jIjkr4zmn3Fl",
        "colab_type": "text"
      },
      "source": [
        "Thanks to everyone mentioned above for their hard work and determination to getting to where we are now. The fastai forum is an amazing place to bounce ideas and try new things. Also thank you to Jeremy for making *all* of this possible!"
      ]
    }
  ]
}